Translation and Dictionary (翻訳と辞書)
Norconex HTTP Collector: English Wikipedia
Norconex HTTP Collector

Norconex HTTP Collector is a web spider, or crawler, initially created for enterprise search integrators and developers. It began as a closed-source project developed by Norconex and was released as open source in 2013.〔Source Code〕〔Beyond Search 1〕〔Beyond Search 2〕〔Big Data Made Simple〕〔Apache Solr Ecosystem〕
==Architecture==
Norconex HTTP Collector is written entirely in Java. A single Collector installation is responsible for launching one or more crawler threads, each with its own configuration.
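The relationship between one Collector and several independently configured crawler threads can be pictured with a minimal sketch. Everything below is an assumption made for illustration: `CollectorSketch`, `CrawlerConfig`, `runAll`, and the example URLs are hypothetical names, not part of the Norconex API, and each thread only records a report instead of fetching anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative only: every name here is hypothetical, not the Norconex API.
final class CollectorSketch {

    /** Per-crawler settings; each crawler thread gets its own instance. */
    static final class CrawlerConfig {
        final String id;
        final String startUrl;
        final int maxDepth;

        CrawlerConfig(String id, String startUrl, int maxDepth) {
            this.id = id;
            this.startUrl = startUrl;
            this.maxDepth = maxDepth;
        }
    }

    /** Starts one thread per configuration, waits for all of them, and returns sorted run reports. */
    static List<String> runAll(List<CrawlerConfig> configs) {
        ConcurrentLinkedQueue<String> reports = new ConcurrentLinkedQueue<>();
        List<Thread> threads = new ArrayList<>();
        for (CrawlerConfig cfg : configs) {
            // A real crawler would fetch cfg.startUrl here; the sketch only records that it ran.
            Thread t = new Thread(
                () -> reports.add(cfg.id + " -> " + cfg.startUrl + " (maxDepth=" + cfg.maxDepth + ")"),
                "crawler-" + cfg.id);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for crawlers", e);
            }
        }
        List<String> sorted = new ArrayList<>(reports);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<CrawlerConfig> configs = Arrays.asList(
            new CrawlerConfig("news", "http://example.com/news", 2),
            new CrawlerConfig("docs", "http://example.org/docs", 5));
        runAll(configs).forEach(System.out::println);
    }
}
```

Keeping the configuration object immutable and private to each thread means the crawlers share no mutable state other than the thread-safe report queue.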
[Figure: Norconex HTTP Collector diagram]
Each step in the crawler life cycle is configurable and can be overridden. Developers can provide their own interface implementation for most steps undertaken by the crawler. The default implementations cover a wide range of crawling use cases and are built on stable products such as Apache Tika and Apache Derby. The following figure is a high-level representation of a URL life cycle from the crawler's perspective.
[Figure: Norconex HTTP Collector URL life cycle]
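The idea of replacing a default step with a custom interface implementation can be sketched as follows. This is a minimal illustration under stated assumptions: `UrlFilter`, `AcceptAllFilter`, `SuffixBlockFilter`, and `selectUrls` are hypothetical names invented for this example and do not reproduce the actual Norconex interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative only: hypothetical types, not the Norconex API.
final class PluggableStepSketch {

    /** Hypothetical step interface: decides whether a URL should be crawled. */
    interface UrlFilter {
        boolean accept(String url);
    }

    /** Stand-in for a default implementation: accepts every URL. */
    static final class AcceptAllFilter implements UrlFilter {
        @Override
        public boolean accept(String url) {
            return true;
        }
    }

    /** A custom implementation a developer might plug in instead of the default. */
    static final class SuffixBlockFilter implements UrlFilter {
        private final String blockedSuffix;

        SuffixBlockFilter(String blockedSuffix) {
            this.blockedSuffix = blockedSuffix.toLowerCase(Locale.ROOT);
        }

        @Override
        public boolean accept(String url) {
            return !url.toLowerCase(Locale.ROOT).endsWith(blockedSuffix);
        }
    }

    /** The crawler step only sees the interface, so any implementation can be swapped in. */
    static List<String> selectUrls(List<String> candidates, UrlFilter filter) {
        List<String> accepted = new ArrayList<>();
        for (String url : candidates) {
            if (filter.accept(url)) {
                accepted.add(url);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://example.com/a.html", "http://example.com/b.PDF");
        System.out.println(selectUrls(urls, new AcceptAllFilter()));
        System.out.println(selectUrls(urls, new SuffixBlockFilter(".pdf")));
    }
}
```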
The Importer and Committer modules are separate Apache-licensed Java libraries distributed with the Collector.
The Importer module parses incoming documents from their raw form (HTML, PDF, Word, etc.) into a set of extracted metadata and plain-text content. In addition, it provides interfaces to manipulate a document's metadata, transform its content, or filter documents based on their new format. While the Collector is heavily dependent on the Importer module, the latter can be used on its own as a general-purpose document parser.
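The parse, tag, transform, and filter stages described above can be approximated in a few lines. The sketch below is an assumption-laden stand-in: `ImporterSketch`, `Doc`, `parseHtml`, and `importDocument` are hypothetical names, and the regex-based tag stripping is a deliberately naive substitute for a real parser such as Apache Tika.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: hypothetical types, not the Norconex Importer API.
final class ImporterSketch {

    /** Parsed form of a document: extracted metadata plus plain-text content. */
    static final class Doc {
        final Map<String, String> metadata = new LinkedHashMap<>();
        String content = "";
    }

    private static final Pattern TITLE =
        Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Parse step: naive regex extraction standing in for a real HTML parser. */
    static Doc parseHtml(String rawHtml) {
        Doc doc = new Doc();
        Matcher m = TITLE.matcher(rawHtml);
        if (m.find()) {
            doc.metadata.put("title", m.group(1).trim());
        }
        // Drop the title element, then replace every remaining tag with a space.
        doc.content = m.replaceAll(" ").replaceAll("<[^>]+>", " ");
        return doc;
    }

    /** Runs parse, transform, tag, and filter; returns empty when the document is rejected. */
    static Optional<Doc> importDocument(String rawHtml, int minChars) {
        Doc doc = parseHtml(rawHtml);
        // Transform step: collapse runs of whitespace.
        doc.content = doc.content.replaceAll("\\s+", " ").trim();
        // Tag step: add a derived metadata field.
        doc.metadata.put("contentLength", String.valueOf(doc.content.length()));
        // Filter step: reject documents with too little text.
        return doc.content.length() >= minChars ? Optional.of(doc) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Hello</title></head><body><p>Some   text</p></body></html>";
        importDocument(html, 1).ifPresent(d -> System.out.println(d.metadata + " / " + d.content));
    }
}
```

Returning an `Optional` makes the filter outcome explicit: a rejected document simply never reaches the next stage.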
The Committer module is responsible for directing the parsed data to a target repository of choice. Developers can write custom implementations, allowing Norconex HTTP Collector to be used with any search engine or repository. Two Committer implementations currently exist, for Apache Solr and Elasticsearch.
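A custom committer can be pictured as a small interface plus an implementation that targets a particular repository. The following is only a sketch under assumptions: the `Committer` interface, its `add` and `commit` methods, and `InMemoryCommitter` are hypothetical and simplified, and a plain map stands in for a search engine index.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: hypothetical types, not the Norconex Committer API.
final class CommitterSketch {

    /** Hypothetical contract: queue parsed documents, then push them to the target. */
    interface Committer {
        void add(String reference, String content, Map<String, String> metadata);

        void commit();
    }

    /** Custom implementation whose "repository" is an in-memory map. */
    static final class InMemoryCommitter implements Committer {
        private final Map<String, Map<String, String>> pending = new LinkedHashMap<>();
        final Map<String, Map<String, String>> repository = new LinkedHashMap<>();

        @Override
        public void add(String reference, String content, Map<String, String> metadata) {
            // Store metadata and content together as the fields of one record.
            Map<String, String> fields = new LinkedHashMap<>(metadata);
            fields.put("content", content);
            pending.put(reference, fields);
        }

        @Override
        public void commit() {
            // Documents become visible in the repository only at commit time.
            repository.putAll(pending);
            pending.clear();
        }
    }

    public static void main(String[] args) {
        InMemoryCommitter committer = new InMemoryCommitter();
        committer.add("http://example.com/", "Some text", Collections.singletonMap("title", "Hello"));
        committer.commit();
        System.out.println(committer.repository);
    }
}
```

Separating `add` from `commit` lets an implementation batch documents before sending them, which is a common pattern when the target is a remote index.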

Excerpt source: Wikipedia, the free encyclopedia

Copyright(C) kotoba.ne.jp 1997-2016. All Rights Reserved.